Automating the Analysis of HIV Immunogen Antigenic Characteristics

(Alt Title: I'm getting lazy)

Michael Chambers

What I Do: A lotta quality control for HIV Immunogens

Objective: Automate the data analysis for these immunogens

My Goals:

  1. Convert MSD output file to .csv
  2. Break up the .csv into 8x12 arrays
  3. Average column duplicates and create graph
  4. Import raw data into Prism for further analysis

What I made:

  1. A script that accomplishes ALMOST all of the above
  2. A rudimentary module to easily manipulate the data from each plate

DEMO TIME!


In [1]:
#This is the MSD .txt output file I'll be working with:
with open('data.txt') as data:  # the with-block closes the file when done
    raw_text = data.read()
raw_text


Out[1]:
'==========Data==================================================================================================================================================================================================\n\t1\t2\t3\t4\t5\t6\t7\t8\t9\t10\t11\t12\nA\t963268\t965546\t934816\t965927\t9476\t9728\t75679\t75719\t58511\t57280\t189749\t186988\nB\t973919\t976761\t940526\t939458\t5534\t5500\t37403\t37522\t29628\t28250\t91300\t94668\nC\t907480\t912649\t875027\t873084\t3671\t3950\t20350\t20372\t16409\t16225\t51356\t51729\nD\t756367\t762278\t723299\t722686\t3134\t2984\t10568\t10621\t8535\t8338\t26760\t27257\nE\t465322\t468026\t429960\t437096\t2602\t331\t5604\t5873\t4866\t4973\t14761\t14776\nF\t244089\t250824\t234123\t237409\t2071\t2177\t2944\t3083\t2874\t2834\t7214\t7122\nG\t134690\t139305\t128528\t133833\t2030\t2035\t1663\t1759\t1688\t1724\t3949\t3798\nH\t86\t89\t165\t207\t1621\t1640\t97\t100\t187\t568\t126\t126\n\n==========Data==================================================================================================================================================================================================\n\t1\t2\t3\t4\t5\t6\t7\t8\t9\t10\t11\t12\nA\t146106\t148267\t1008\t1007\t1112\t1183\t2666\t2648\t3053\t3033\t4916\t4792\nB\t109212\t107029\t557\t623\t1056\t1051\t1322\t1344\t1690\t1744\t2362\t2405\nC\t71341\t67961\t404\t402\t1129\t1331\t762\t783\t1130\t1140\t1313\t1286\nD\t38312\t35936\t232\t319\t1323\t1192\t418\t453\t826\t825\t718\t750\nE\t16283\t16196\t277\t285\t1205\t1254\t267\t287\t667\t670\t439\t446\nF\t6311\t6260\t249\t258\t1336\t1260\t185\t188\t595\t597\t268\t274\nG\t3023\t2910\t256\t260\t1399\t1335\t149\t155\t532\t560\t205\t200\nH\t155\t160\t228\t254\t1319\t1259\t95\t99\t500\t522\t127\t130\n'
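The raw output above is tab-separated: each plate starts with a `==========Data===...` banner line, followed by a column-number header and one line per row (A to H). Here is a minimal sketch of goals 1 and 2 (split into plates, write a .csv), using only the standard library. The function names `parse_msd_plates` and `plates_to_csv` are made up for illustration and are not taken from msd_script.py or msd_module.py, which may well do this differently.

```python
import csv
import io

def parse_msd_plates(text):
    """Split MSD text output into plates.

    Each plate is a list of (row_label, [well values]) tuples.
    Assumes a line starting with '=' begins a new plate.
    """
    plates = []
    for line in text.splitlines():
        if line.startswith("="):
            plates.append([])  # banner line: start a new plate
            continue
        fields = line.split("\t")
        # Skip blank lines, the column-number header (empty first field),
        # and anything that appears before the first banner.
        if not line.strip() or not fields[0].strip() or not plates:
            continue
        plates[-1].append((fields[0], [int(v) for v in fields[1:]]))
    return plates

def plates_to_csv(plates):
    """Return CSV text with one line per plate row: plate number, row label, wells."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for number, plate in enumerate(plates, start=1):
        for label, values in plate:
            writer.writerow([number, label] + values)
    return buf.getvalue()

# Tiny made-up sample in the same layout (2 columns instead of 12).
sample = (
    "=====Data=====\n\t1\t2\nA\t10\t20\nB\t30\t40\n\n"
    "=====Data=====\n\t1\t2\nA\t1\t3\n"
)
sample_plates = parse_msd_plates(sample)
sample_csv = plates_to_csv(sample_plates)
```

With the real file, passing the text read from data.txt to `parse_msd_plates` should give two plates of eight rows each, assuming every plate follows the layout shown above.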

(1) Demo Script: msd_script.py

(2) Demo Module: msd_module.py


In [2]:
#Import msd_module
%matplotlib inline
import msd_module


msd_module imported!

In [3]:
#Check Docstring
msd_module?

In [4]:
project1 = msd_module.msd_96()


Input date (e.g. yymmdd): 160511
Input project name (e.g. my_project): Project2

In [6]:
project1.create_df('data.txt')


df created!

In [7]:
project1.df


Out[7]:
Rows 1 2 3 4 5 6 7 8 9 10 11 12
0 A 963268 965546 934816 965927 9476 9728 75679 75719 58511 57280 189749 186988
1 B 973919 976761 940526 939458 5534 5500 37403 37522 29628 28250 91300 94668
2 C 907480 912649 875027 873084 3671 3950 20350 20372 16409 16225 51356 51729
3 D 756367 762278 723299 722686 3134 2984 10568 10621 8535 8338 26760 27257
4 E 465322 468026 429960 437096 2602 331 5604 5873 4866 4973 14761 14776
5 F 244089 250824 234123 237409 2071 2177 2944 3083 2874 2834 7214 7122
6 G 134690 139305 128528 133833 2030 2035 1663 1759 1688 1724 3949 3798
7 H 86 89 165 207 1621 1640 97 100 187 568 126 126
8 A 146106 148267 1008 1007 1112 1183 2666 2648 3053 3033 4916 4792
9 B 109212 107029 557 623 1056 1051 1322 1344 1690 1744 2362 2405
10 C 71341 67961 404 402 1129 1331 762 783 1130 1140 1313 1286
11 D 38312 35936 232 319 1323 1192 418 453 826 825 718 750
12 E 16283 16196 277 285 1205 1254 267 287 667 670 439 446
13 F 6311 6260 249 258 1336 1260 185 188 595 597 268 274
14 G 3023 2910 256 260 1399 1335 149 155 532 560 205 200
15 H 155 160 228 254 1319 1259 95 99 500 522 127 130

In [8]:
project1.split_plates()


Plate#_1
  row  dilution         1         2       3        4        5         6
0   A  5.000000  964407.0  950371.5  9602.0  75699.0  57895.5  188368.5
1   B  2.500000  975340.0  939992.0  5517.0  37462.5  28939.0   92984.0
2   C  1.250000  910064.5  874055.5  3810.5  20361.0  16317.0   51542.5
3   D  0.625000  759322.5  722992.5  3059.0  10594.5   8436.5   27008.5
4   E  0.312500  466674.0  433528.0  1466.5   5738.5   4919.5   14768.5
5   F  0.156250  247456.5  235766.0  2124.0   3013.5   2854.0    7168.0
6   G  0.078125  136997.5  131180.5  2032.5   1711.0   1706.0    3873.5
7   H  0.039062      87.5     186.0  1630.5     98.5    377.5     126.0
Axes(0.125,0.125;0.775x0.775)
Plate#_2
   row  dilution         1       2       3       4       5       6
8    A  5.000000  147186.5  1007.5  1147.5  2657.0  3043.0  4854.0
9    B  2.500000  108120.5   590.0  1053.5  1333.0  1717.0  2383.5
10   C  1.250000   69651.0   403.0  1230.0   772.5  1135.0  1299.5
11   D  0.625000   37124.0   275.5  1257.5   435.5   825.5   734.0
12   E  0.312500   16239.5   281.0  1229.5   277.0   668.5   442.5
13   F  0.156250    6285.5   253.5  1298.0   186.5   596.0   271.0
14   G  0.078125    2966.5   258.0  1367.0   152.0   546.0   202.5
15   H  0.039062     157.5   241.0  1289.0    97.0   511.0   128.5
Axes(0.125,0.125;0.775x0.775)
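The six numbered columns in the output above line up with averages of adjacent well pairs (columns 1 and 2, 3 and 4, and so on) from the raw 12-column plate. A minimal sketch of that averaging step in plain Python follows. `average_duplicates` is a hypothetical helper for illustration, not the module's actual code (which appears to work on a DataFrame).

```python
def average_duplicates(values):
    """Average each adjacent pair of wells: (1,2), (3,4), ..., (11,12)."""
    if len(values) % 2:
        raise ValueError("expected an even number of wells per row")
    return [(values[i] + values[i + 1]) / 2 for i in range(0, len(values), 2)]

# Row A of the first plate, copied from the raw data.
row_a = [963268, 965546, 934816, 965927, 9476, 9728,
         75679, 75719, 58511, 57280, 189749, 186988]
averaged = average_duplicates(row_a)
# averaged == [964407.0, 950371.5, 9602.0, 75699.0, 57895.5, 188368.5]
```

These values match row A of Plate#_1 in the output above.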

In [9]:
project1.dilution


Out[9]:
[5, 2.5, 1.25, 0.625, 0.3125, 0.15625, 0.078125, 0.0390625]

In [10]:
project1.create_dilution(5,3,8)

In [11]:
project1.dilution


Out[11]:
[5,
 1.6666666666666667,
 0.5555555555555556,
 0.1851851851851852,
 0.0617283950617284,
 0.0205761316872428,
 0.006858710562414266,
 0.0022862368541380885]
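Judging from the two outputs, `create_dilution(5,3,8)` seems to mean: start at 5, divide by 3 at each step, 8 points total (and the default looks like a 2-fold series from 5). That reading of the arguments is an assumption based on the numbers, not on the module's docstring. A minimal sketch of such a serial dilution, with a made-up function name:

```python
def serial_dilution(start, factor, points):
    """Return `points` values, each the previous one divided by `factor`."""
    return [start / factor ** i for i in range(points)]

two_fold = serial_dilution(5, 2, 8)
# two_fold == [5.0, 2.5, 1.25, 0.625, 0.3125, 0.15625, 0.078125, 0.0390625]

three_fold = serial_dilution(5, 3, 8)  # 5, 5/3, 5/9, ...
```

The 3-fold values agree with Out[11] up to floating-point rounding in the last digits.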

In [12]:
project1.split_plates()


Plate#_1
  row  dilution         1         2       3        4        5         6
0   A  5.000000  964407.0  950371.5  9602.0  75699.0  57895.5  188368.5
1   B  1.666667  975340.0  939992.0  5517.0  37462.5  28939.0   92984.0
2   C  0.555556  910064.5  874055.5  3810.5  20361.0  16317.0   51542.5
3   D  0.185185  759322.5  722992.5  3059.0  10594.5   8436.5   27008.5
4   E  0.061728  466674.0  433528.0  1466.5   5738.5   4919.5   14768.5
5   F  0.020576  247456.5  235766.0  2124.0   3013.5   2854.0    7168.0
6   G  0.006859  136997.5  131180.5  2032.5   1711.0   1706.0    3873.5
7   H  0.002286      87.5     186.0  1630.5     98.5    377.5     126.0
Axes(0.125,0.125;0.775x0.775)
Plate#_2
   row  dilution         1       2       3       4       5       6
8    A  5.000000  147186.5  1007.5  1147.5  2657.0  3043.0  4854.0
9    B  1.666667  108120.5   590.0  1053.5  1333.0  1717.0  2383.5
10   C  0.555556   69651.0   403.0  1230.0   772.5  1135.0  1299.5
11   D  0.185185   37124.0   275.5  1257.5   435.5   825.5   734.0
12   E  0.061728   16239.5   281.0  1229.5   277.0   668.5   442.5
13   F  0.020576    6285.5   253.5  1298.0   186.5   596.0   271.0
14   G  0.006859    2966.5   258.0  1367.0   152.0   546.0   202.5
15   H  0.002286     157.5   241.0  1289.0    97.0   511.0   128.5
Axes(0.125,0.125;0.775x0.775)

That's it!

So what's coming in msd_module VERSION 2.0?

(more like version 0.0.2)
  • Make it pretty
  • Interpret more .txt file types
  • Expand on module functions
  • Write data in XML and import into Prism for further analysis

Is it on GitHub? HELL YEAH!!!

https://github.com/greenkidneybean/MSD_Module

How much time do these tools save me?

For now, I'll just say I'm in the red.

Cheers!

~mc